A neural network is a model inspired by how the human brain works. It consists of “neurons” (also called nodes) connected in layers. In a Multi-Layer Perceptron (MLP), these layers come in three types:
Input layer: Receives the input data (e.g., the predictors).
Hidden layer(s): Layers between the input and output layers, where the network learns to detect patterns in the data.
Output layer: Produces the final prediction. For classification problems, the output layer typically contains neurons representing each class (e.g., “stay” or “leave”), and the network outputs a probability for each class. For regression problems, there’s often a single neuron in the output layer, providing a continuous value.
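To make this concrete, here is a minimal sketch of this layered structure using scikit-learn's MLPClassifier. The data is synthetic and stands in for a hypothetical stay/leave prediction problem; the layer sizes are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for employee data: 5 predictors, binary stay/leave outcome
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Input layer: 5 neurons (one per predictor, created automatically from X)
# Hidden layer: 8 neurons between input and output
# Output layer: one probability per class ("stay" vs. "leave")
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(X, y)

print(clf.predict_proba(X[:1]))  # two class probabilities that sum to 1
```

The `hidden_layer_sizes=(8,)` tuple is where the hidden layers are specified; `(8, 4)` would create two hidden layers instead of one.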
How Neurons Process Inputs
In a neural network, each “neuron” (or node) receives some inputs, performs calculations, and then decides (1) whether to “fire” (i.e., whether or not to pass its output on to the next layer) and (2) how much of the signal to send.
Each neuron receives inputs from the previous layer (or, in the case of the input layer, the original input data), multiplies each input by a weight that represents its importance, and adds a bias term. The neuron then sums these weighted inputs, and the result is passed through an activation function to produce the neuron’s output, which will be sent to the neurons in the next layer.
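The computation described above can be sketched in a few lines of numpy. The input values, weights, and bias below are made up for illustration, and ReLU is used as one common choice of activation function.

```python
import numpy as np

def neuron_output(inputs, weights, bias):
    """One neuron: weighted sum of inputs plus bias, passed through an activation."""
    z = np.dot(weights, inputs) + bias  # multiply inputs by weights, sum, add bias
    return max(0.0, z)                  # ReLU activation: pass z through if positive

# Illustrative values (made up for this example)
x = np.array([0.5, -1.2, 3.0])   # inputs arriving from the previous layer
w = np.array([0.8, 0.1, -0.4])   # weights: the importance of each input
b = 0.2                          # bias term

print(neuron_output(x, w, b))    # → 0.0 (the weighted sum is negative, so ReLU blocks it)
```

Here the weighted sum works out to −0.72, so the ReLU activation outputs 0.0: the neuron does not “fire,” and nothing is passed to the next layer.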
Hidden Layers
The hidden layers are where the network learns and detects patterns. These layers are often referred to as the “black box” of the network because we can’t directly observe the patterns they’re learning.
In research terms, the hidden layers essentially act as “mediators”; they take in the inputs and don’t just pass the information to the output, but instead “mediate” by processing, transforming, and combining the inputs to build complex representations that can help make the final prediction.
Just as mediators in research help us understand the process behind a relationship, hidden layers help the MLP capture and represent complex underlying processes in the data. They can be thought of as extracting intermediate representations that explain part of the relationship between input data and output prediction. Each hidden layer builds on the patterns detected by the previous layer, making it possible to capture more complex and non-linear relationships.
You can have more than one hidden layer, effectively creating multiple “mediator steps” in the relationship. Each additional layer allows the network to learn and represent more abstract patterns, enabling it to capture even more complex relationships in the data.
Non-Linearity and Activation Functions
An important part of neural networks is that they don’t assume linearity in the underlying relationship. By stacking multiple layers and using non-linear activation functions, an MLP can capture complex, non-linear relationships in the data, allowing it to model a wider range of patterns.
Activation function: An activation function is a mathematical function applied to the output of each neuron (node) in a neural network layer. After a neuron receives inputs, it calculates a weighted sum (adding together the inputs multiplied by their weights and adding a bias). This weighted sum is then passed through an activation function, which determines the neuron’s output.
The purpose of the activation function is to introduce non-linearity into the model. Without it, the entire network would behave like a single linear transformation, regardless of how many layers it has. By using activation functions, we allow the network to learn and represent non-linear relationships, which are common in real-world data.
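The claim that a network without activation functions collapses into a single linear transformation can be verified directly. The matrices below are random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # weights of a first "layer" (no activation)
W2 = rng.normal(size=(2, 4))   # weights of a second "layer"
x = rng.normal(size=3)         # an arbitrary input vector

# Two stacked linear layers...
out_two_layers = W2 @ (W1 @ x)
# ...are exactly equivalent to ONE linear layer with weights W2 @ W1:
out_one_layer = (W2 @ W1) @ x
print(np.allclose(out_two_layers, out_one_layer))  # True

# Inserting a non-linear activation (ReLU) between the layers breaks
# this equivalence, which is what lets the network model non-linear patterns:
relu = lambda z: np.maximum(z, 0.0)
out_nonlinear = W2 @ relu(W1 @ x)
```

No matter how many linear layers are stacked, their product is still one matrix; the activation function between layers is what prevents this collapse.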
Training the Neural Network
Imagine a neural network is making predictions, like guessing whether an employee will stay or leave based on certain inputs. When the network makes a guess, we can compare the guess to the actual answer and see how far off it was. This difference is called the error.
The process of training a neural network involves reducing this error by adjusting the weights the network assigns to its connections. This is done through a process called backpropagation: the network works backwards through its layers to figure out which weights contributed most to the error, then adjusts those weights to reduce it.
This process of making predictions, checking errors, and adjusting weights is repeated many times across the dataset. Each time, the network improves slightly, learning which inputs are more important for accurate predictions.
Over many rounds, the network gets better at making predictions because it has fine-tuned the weights, allowing it to minimize the error and better understand the relationship between inputs and the desired output.
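This predict–check–adjust loop can be sketched for the smallest possible network: a single output neuron with a sigmoid activation (i.e., logistic regression). The dataset, labels, and learning rate below are made up for illustration, and the gradient step is written out by hand to mirror the three steps described above.

```python
import numpy as np

# Tiny made-up dataset: 4 employees, 2 predictors, label 1 = "leave", 0 = "stay"
X = np.array([[0.1, 0.9], [0.9, 0.2], [0.8, 0.1], [0.2, 0.8]])
y = np.array([1.0, 0.0, 0.0, 1.0])

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

w = np.zeros(2)  # weights start with no opinion about either predictor
b = 0.0
lr = 1.0         # learning rate: how big each adjustment is

for _ in range(500):                   # many rounds over the dataset
    pred = sigmoid(X @ w + b)          # 1. make predictions
    error = pred - y                   # 2. check how far off they are
    w -= lr * (X.T @ error) / len(y)   # 3. adjust weights to reduce the error
    b -= lr * error.mean()             #    (the gradient flows backwards)

print(sigmoid(X @ w + b).round(2))     # predictions have moved toward y
```

In a full MLP, backpropagation applies this same gradient logic layer by layer, working backwards from the output; libraries such as scikit-learn or PyTorch handle that bookkeeping automatically.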